Intrinsic Author Identification Using Modified Weighted KNN Notebook for PAN at CLEF 2013

نویسنده

  • M. R. Ghaeini
چکیده

In this paper we describe our approach and task for the PAN 2013 Author Identification. Best thing for an intelligent application is a large corpus, but when it is small, it would be helpful to extract many features from dataset and study on them. So, we extract many features from documents like: word bag, stop word bag, punctuation bag, part of speech (POS) bag and etc. It is so important that you select proper features for your learning algorithm; therefore, we decide to learn proper features too. Next, we identify author using a modified weighted KNN since it can have a good performance for a small corpus.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Vector Space Model and Overlap Metric for Author Identification Notebook for PAN at CLEF 2013

This paper describes our entry for the Author Identification task at PAN 2013. The Author Identification task was performed using a combination of Vector Space Model [1] (VSM) and Similarity Overlap Metric [3] (SOM) on the character n-grams extracted from the documents related to an author and the document of question. A combination of the VSM and SOM provided an overall F-measure, precision an...

متن کامل

Style-based Distance Features for Author Profiling Notebook for PAN at CLEF 2013

In this paper we present the approach we took in our participation to the PAN 2013 Author Identification task. It relies on a complex process to select the features which represent the author’s writing, using potentially multiple statistics and distance measures computed from the training set.

متن کامل

Style-based Distance Features for Author Verification Notebook for PAN at CLEF 2013

In this paper we present the approach we took in our participation to the PAN 2013 Author Profiling task. It is an adaptation of our system submitted for author identification, assuming that a profile category (authors belonging to the same gender and age group categories) can be analyzed in the same way as an author’s style.

متن کامل

Readability for Author Profiling? Notebook for PAN at CLEF 2013

This paper briefly describes the approach taken to the Author Profiling task at PAN 13. It describes the simple features used, and the origins in thinking around text readability as a mechanism for identification, and the predictive model used which may have beneficially omitted classes, as well as offering commentary on the results obtained.

متن کامل

Local n-grams for Author Identification Notebook for PAN at CLEF 2013

Our approach to the author identification task uses existing authorship attribution methods using local n-grams (LNG) and performs a weighted ensemble. This approach came in third for this year’s competition, using a relatively simple scheme of weights by training set accuracy. LNG models create profiles, consisting of a list of character n-grams that best represent a particular author’s writin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013